ABSTRACT

A maximal vector of a set is one which is not less than any other vector
in all components. We derive a recurrence relation for computing the
average number of maximal vectors in a set of n vectors in d-space under
the assumption that all $(n!)^d$ relative orderings are equally probable.
Solving the recurrence shows that the average number of maxima is
$O((\ln n)^{d-1})$. We use this result to construct an algorithm for
finding all the maxima that has expected running time linear in n (for
sets of vectors drawn under our assumptions). For a given set of random
points, the result is also used to derive an upper bound on the expected
number of points from the set which are on the boundary of the convex
hull of the set.

1. INTRODUCTION

The problem of finding all maximal vectors in a set of n d-vectors has
recently been studied by Kung, Luccio and Preparata [3] and F. Yao [7].
In this paper we consider the related problem of finding the expected
number of maximal elements in a given set. We give a solution to that
problem under a very general probability distribution and then apply the
answer to the solution of related problems.

A maximal vector is one which is not less than any other vector in all
components. More precisely, we say that a vector P dominates the vector Q
if P is greater than Q in every component; then a vector is maximal if it
is not dominated by any other vector in the set. For example, in
{(1,2,4), (2,3,1), (3,1,3), (4,4,2)}, only (2,3,1) is not maximal, since
it is dominated by (4,4,2). It is helpful to view this problem
geometrically when d = 2. In that case the vectors can be considered as n
points in the plane and a given vector is maximal if and only if there is
no point in its first quadrant (above it and to its right).
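The definition of dominance can be made concrete with a small sketch. The function names `dominates` and `maxima` are illustrative choices, not taken from the paper, and the quadratic scan is only the most direct reading of the definition, not an efficient algorithm.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p is strictly greater than q in every component."""
    return all(a > b for a, b in zip(p, q))


def maxima(vectors: Sequence[Vector]) -> List[Vector]:
    """Return the vectors that are dominated by no other vector in the set.

    This is a direct O(d * n^2) transcription of the definition.
    """
    return [q for q in vectors if not any(dominates(p, q) for p in vectors)]


# The example set from the text: only (2,3,1) is dominated, by (4,4,2).
example = [(1, 2, 4), (2, 3, 1), (3, 1, 3), (4, 4, 2)]
print(maxima(example))  # prints [(1, 2, 4), (3, 1, 3), (4, 4, 2)]
```

Because dominance is strict in every component, a vector never dominates itself, so the scan does not need to exclude the vector being tested.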

A probability distribution is implied when we ask for the expected number
of maxima. A mathematically tractable yet reasonable model assumes that,
for each vector, the magnitude of each component is distributed
independently of the magnitudes of the other components and that, for
each component, the magnitudes chosen for the n vectors are distinct. The
second restriction implies that the vectors can be sorted into increasing
order on any component, yielding a relative ordering from 1 to n. Thus
each set of n d-vectors corresponds to a particular relative ordering for
each component, that is, to one of $(n!)^d$ assignments of permutations
of (1,2,3,...,n) to the d components. Examples of multivariate
statistical distributions with distinct components distributed
independently include the multivariate normal and the multivariate
uniform drawn from a unit hypercube. (Recall that elements drawn
independently from any continuous distribution function are distinct with
probability one.)
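The model above can be simulated directly: draw an independent uniform random permutation of 1..n for each of the d components and count the maxima. The following is a minimal sketch under that reading of the model; the helper names, the seed, and the trial counts are illustrative assumptions.

```python
import random
from typing import List, Tuple


def count_maxima(vectors: List[Tuple[int, ...]]) -> int:
    """Count vectors not strictly dominated in every component by another."""
    def dominated(q):
        return any(all(a > b for a, b in zip(p, q)) for p in vectors)
    return sum(1 for q in vectors if not dominated(q))


def random_vectors(n: int, d: int, rng: random.Random) -> List[Tuple[int, ...]]:
    """Build n d-vectors whose d columns are independent random permutations of 1..n."""
    columns = []
    for _ in range(d):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        columns.append(perm)
    return list(zip(*columns))  # row i takes the i-th entry of every column


def estimate_average_maxima(n: int, d: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of maxima under the model."""
    rng = random.Random(seed)
    total = sum(count_maxima(random_vectors(n, d, rng)) for _ in range(trials))
    return total / trials
```

A call such as `estimate_average_maxima(20, 2, 2000)` gives an empirical value that can be compared with the exact averages derived in Section 2.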

The solution to the maximal vector problem is often required in the
analysis of the runtime of dynamic programming algorithms (see Schkolnick
[5] and Schkolnick and Thompson [6]). In dynamic programming the solution
to a problem of size n is obtained from the best solutions of problems of
size n-1. For many applications a cost vector of length one is
sufficient, i.e., there is a single best solution to all subproblems. In
cases where more than one best solution must be retained for each
subproblem, it may still be possible to design a multidimensional cost
function with the property that the best solutions for every subproblem
are just the maximal ones. If the cost vectors of candidate solutions are
assumed to have the proper distribution, then the maximal vector problem
indicates the expected number of best solutions.

In Section 2 we formulate and solve a recurrence that shows that the
expected number of maxima among n d-vectors is $O((\ln n)^{d-1})$. We use
this result in Section 3 to give an algorithm for finding all the maxima
of a set of n d-vectors that has expected running time linear in n. In
Section 4 we show, for a given set of random points, how this result
gives an upper bound on the expected number of points from the set which
are on the boundary of the convex hull of the set.

2. DETERMINING THE AVERAGE NUMBER OF MAXIMA

In this section we derive the primary result of this paper. We give two
derivations of this result. Our first derivation is formal and therefore
rather complicated, so we supplement that with a second, informal
derivation. The second derivation is not completely precise, but it does
give an intuitive idea of the essential workings of the formal
derivation. We proceed directly with the formal derivation; the informal
begins immediately after the statement of Theorem 2.

Let A(n,d) be the average number of maximal vectors out of n d-vectors.
Without loss of generality, assume that the vector components in each
dimension are integers from 1 to n. We shall therefore view a set of n
d-vectors as an n by d array, whose rows are the vectors and whose
columns are permutations of {1,2,...,n}. Let S be the set of all such
arrays. Then S contains $(n!)^d$ arrays. For any array r in S, let M(r)
denote the number of maximal vectors in r. By the definition of A(n,d),
we have

$$A(n,d) = \frac{\sum_{r \in S} M(r)}{(n!)^d}.$$

Let T be the subset of S which consists of the arrays whose first columns
are equal to $(1,2,\ldots,n)^T$. Because M(r) is invariant under
permutations of the rows in r, it follows that

$$n! \sum_{r \in T} M(r) = \sum_{r \in S} M(r).$$

Thus,

(2.1)  $(n!)^{d-1} A(n,d) = \sum_{r \in T} M(r).$
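For very small n and d, identity (2.1) can be evaluated by exhaustive enumeration of T. The sketch below fixes the first column at (1,...,n), lets the other d-1 columns range over all permutations, and divides the total count of maxima by $(n!)^{d-1}$. The function names are illustrative, and the enumeration is only feasible for tiny inputs.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def count_maxima(rows):
    """Count rows not strictly dominated in every component by another row."""
    def dominated(q):
        return any(all(a > b for a, b in zip(p, q)) for p in rows)
    return sum(1 for q in rows if not dominated(q))


def average_maxima_bruteforce(n: int, d: int) -> Fraction:
    """Evaluate A(n,d) through (2.1) by enumerating every array in T."""
    first_column = tuple(range(1, n + 1))
    perms = list(permutations(first_column))
    total = 0
    # Each choice of d-1 permutations for the remaining columns is one array r in T.
    for other_columns in product(perms, repeat=d - 1):
        rows = list(zip(first_column, *other_columns))
        total += count_maxima(rows)
    return Fraction(total, factorial(n) ** (d - 1))


print(average_maxima_bruteforce(3, 2))  # prints 11/6
```

For d = 2 the count of maxima in an array of T is the number of right-to-left maxima of the second column, which is why the value for n = 3 equals 1 + 1/2 + 1/3.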
Any array r in T can be decomposed as

$$r = \begin{pmatrix} 1 & A_r \\ B & L_r \end{pmatrix},$$

where $B = (2,3,\ldots,n)^T$, region 1 is the first row $(1,A_r)$, and
region 2 consists of the remaining n-1 rows $(B, L_r)$. (Region 1
contains only one vector, namely, $(1,A_r)$.) For i = 1, 2, define
$M_i(r)$ to be the number of maximal vectors in r which are in region i.
Thus,

(2.2)  $M(r) = M_1(r) + M_2(r).$

(Note that $M_1(r)$ is either zero or one.) Taking sums on both sides, we
have by (2.1),

(2.3)  $(n!)^{d-1} A(n,d) = \sum_{r \in T} M_1(r) + \sum_{r \in T} M_2(r).$

Lemma 1

$$\sum_{r \in T} M_1(r) = \frac{1}{n} (n!)^{d-1} A(n,d-1).$$

Proof

Note that $\sum_{r \in T} M_1(r)$ is the number of times, over all r in
T, vector $(1,A_r)$ is maximal in r. Note also that vector $(1,A_r)$ is
maximal over all n d-vectors in r if and only if vector $A_r$ is maximal
over all n (d-1)-vectors in the subarray
$\begin{pmatrix} A_r \\ L_r \end{pmatrix}$. Then $\sum_{r \in T} M_1(r)$
is simply the number of times, over all r in T, the vector $A_r$ in the
first row is maximal in $\begin{pmatrix} A_r \\ L_r \end{pmatrix}$. Since
over all r in T there are $(n!)^{d-1} A(n,d-1)$ maximal (d-1)-vectors and
the number of maximal vectors occurring in each row is the same, it
follows that

$$\sum_{r \in T} M_1(r) = \frac{1}{n} (n!)^{d-1} A(n,d-1). \qquad \Box$$

Lemma 2

$$\sum_{r \in T} M_2(r) = (n!)^{d-1} A(n-1,d).$$

Proof

Consider an array r in T. Note that a d-vector in region 2 is maximal
over all n d-vectors in r if and only if it is maximal over all n-1
d-vectors in region 2, because the vector in region 1 has the smallest
first component and therefore dominates no other vector. Therefore, for
any r in T we shall consider only region 2. $\sum_{r \in T} M_2(r)$ is
the number of occurrences of maximal vectors in region 2 over all r in T.
Let A be a fixed (d-1)-vector. Over all r in T where r has (1,A) as its
first row, there are $((n-1)!)^{d-1} A(n-1,d)$ maximal d-vectors
occurring in region 2. Since A may be chosen in $n^{d-1}$ ways, we have

$$\sum_{r \in T} M_2(r) = (n!)^{d-1} A(n-1,d). \qquad \Box$$


The following theorem follows from (2.3) and Lemmas 1 and 2.

Theorem 1

(2.4)  $A(n,d) = A(n-1,d) + \frac{1}{n} A(n,d-1)$

for n, d ≥ 2.

It is easy to check that

(2.5)  A(1,d) = 1 for d ≥ 1,

(2.6)  A(n,1) = 1 for n ≥ 1.
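Recurrence (2.4) with the initial conditions (2.5) and (2.6) can be tabulated exactly with rational arithmetic. The following is a minimal sketch; the function name and the table layout (row 0 and column 0 unused) are illustrative choices.

```python
from fractions import Fraction
from typing import List


def average_maxima_table(n_max: int, d_max: int) -> List[List[Fraction]]:
    """Return a table A with A[n][d] = A(n,d) for 1 <= n <= n_max, 1 <= d <= d_max."""
    # Initialising every entry to 1 covers (2.5) A(1,d) = 1 and (2.6) A(n,1) = 1.
    A = [[Fraction(1)] * (d_max + 1) for _ in range(n_max + 1)]
    for n in range(2, n_max + 1):
        for d in range(2, d_max + 1):
            # Recurrence (2.4): A(n,d) = A(n-1,d) + A(n,d-1) / n.
            A[n][d] = A[n - 1][d] + A[n][d - 1] / n
    return A


table = average_maxima_table(4, 3)
print(table[3][2], table[2][3], table[3][3])  # prints 11/6 7/4 85/36
```

Using `Fraction` keeps the values exact, which makes it easy to compare them against closed forms or against exhaustive enumeration for small n.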


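The recurrence (2.4) with initial conditions (2.5) and (2.6) can be
solved by first setting up the generating functions

$$G_n(z) = \sum_{d \ge 0} A(n,d+1) z^d, \quad \text{for } n \ge 1.$$

By (2.4) and (2.6),

$$G_n(z) = 1 + \sum_{d \ge 1} \left[ A(n-1,d+1) + \frac{1}{n} A(n,d) \right] z^d
         = G_{n-1}(z) + \frac{z}{n} G_n(z),$$

so that

(2.7)  $G_n(z) = \dfrac{G_{n-1}(z)}{1 - z/n}$

for n ≥ 2. By (2.5), $G_1(z) = 1/(1-z)$. Hence (2.7) implies that

$$G_n(z) = \prod_{1 \le i \le n} \frac{1}{1 - z/i},$$

which is Eq. (33) of Section 1.2.9 in Knuth [1] with $x_i = 1/i$. Define

$$H^{(k)}(n) = 1 + \frac{1}{2^k} + \frac{1}{3^k} + \cdots + \frac{1}{n^k}.$$

Knuth's analysis shows that the coefficient of $z^d$ in $G_n(z)$ is

(2.8)  $A(n,d+1) = \displaystyle\sum_{\substack{k_1,k_2,\ldots,k_d \ge 0 \\ k_1+2k_2+\cdots+dk_d = d}} \; \prod_{j=1}^{d} \frac{1}{k_j!} \left( \frac{H^{(j)}(n)}{j} \right)^{k_j}.$

In particular,

$A(n,2) = H^{(1)}(n)$, which is the nth harmonic number $H_n$,

$A(n,3) = \frac{1}{2} H^{(1)}(n)^2 + \frac{1}{2} H^{(2)}(n)$,

$A(n,4) = \frac{1}{6} H^{(1)}(n)^3 + \frac{1}{2} H^{(1)}(n) H^{(2)}(n) + \frac{1}{3} H^{(3)}(n)$.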
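As a quick sanity check of the closed form for A(n,3) (a worked instance added for illustration, not part of the original derivation), take n = 2, so that $H^{(1)}(2) = 3/2$ and $H^{(2)}(2) = 5/4$. The closed form and the recurrence (2.4) should then agree:

```latex
\begin{aligned}
A(2,3) &= \tfrac{1}{2} H^{(1)}(2)^2 + \tfrac{1}{2} H^{(2)}(2)
        = \tfrac{1}{2}\cdot\tfrac{9}{4} + \tfrac{1}{2}\cdot\tfrac{5}{4}
        = \tfrac{7}{4}, \\
A(2,3) &= A(1,3) + \tfrac{1}{2} A(2,2)
        = 1 + \tfrac{1}{2}\cdot\tfrac{3}{2}
        = \tfrac{7}{4}.
\end{aligned}
```

Both routes give 7/4, using A(1,3) = 1 from (2.5) and $A(2,2) = H^{(1)}(2) = 3/2$.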


It is not difficult to show that the sum of the coefficients in A(n,d)
is always 1. (For example, the sum of the coefficients in A(n,4) is
$\frac{1}{6} + \frac{1}{2} + \frac{1}{3} = 1$.) Since
$H^{(r)}(n) \le H^{(1)}(n)^r$ for n, r ≥ 1, we have

(2.9)  $A(n,d) \le H^{(1)}(n)^{d-1}.$


By (2.8) and (2.9) we have

Theorem 2

$$\frac{H^{(1)}(n)^{d-1}}{(d-1)!} \le A(n,d) \le H^{(1)}(n)^{d-1}.$$

Therefore, $A(n,d) = O((\ln n)^{d-1})$ for fixed d.

We now give a more intuitive derivation of the recurrence for A(n,d). As
we stated previously, this derivation is not precise, but it should help
in getting an intuitive idea of the workings of the previous proof. To
compute the expected number of maxima in a set we will consider the set
sorted in order by the first coordinate. (As before, we consider that all
numbers have been translated to the integers from 1 to n.) The situation
we now have is illustrated in the following figure.

[Figure: the n vectors listed in increasing order of their first coordinate.]


We now ask what is the probability that the i-th vector in the set is a
maximum? Since its first coordinate is greater than the first coordinates
of the 1-st through the (i-1)-st vectors, it cannot be dominated by any
of those. Therefore the i-th vector is a maximum if and only if its
remaining d-1 coordinates are maximal in the set of the i-th through the
n-th vectors. The probability that the i-th vector is maximal in this set
is, by independence, the expected number of maxima in the set (which is
A(n-i+1,d-1)) divided by the total number of vectors in the set (which is
n-i+1). Since these probabilities are independent for all values of i, to
find the expected number of maxima in the set we sum the probabilities of
each vector being maximal and we have

$$A(n,d) = \sum_{i=1}^{n} \frac{A(n-i+1,d-1)}{n-i+1} = \sum_{j=1}^{n} \frac{A(j,d-1)}{j}.$$

Notice that the last sum is equivalent to the expression for A(n,d) in
Theorem 1.
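To spell out the equivalence with Theorem 1 (a short verification added for illustration), substitute j = n-i+1 in the sum over i and then split off the term with j = n:

```latex
\begin{aligned}
A(n,d) &= \sum_{i=1}^{n} \frac{A(n-i+1,\,d-1)}{n-i+1}
        = \sum_{j=1}^{n} \frac{A(j,\,d-1)}{j} \\
       &= \sum_{j=1}^{n-1} \frac{A(j,\,d-1)}{j} + \frac{A(n,\,d-1)}{n}
        = A(n-1,\,d) + \frac{1}{n} A(n,\,d-1).
\end{aligned}
```

The last step applies the same sum formula to A(n-1,d), which recovers recurrence (2.4); the case n = 1 gives A(1,d) = A(1,d-1) = 1, in agreement with (2.5).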

We now give a simpler (and less precise) bound on the growth of A(n,d).
It is obvious that A(n,d) must be monotone increasing in n, so if j ≤ n
then A(j,d) ≤ A(n,d). We use this observation in the following
derivation.

$$\begin{aligned}
A(n,d) &= \sum_{j=1}^{n} \frac{A(j,d-1)}{j} \\
       &\le \sum_{j=1}^{n} \frac{A(n,d-1)}{j} \\
       &= A(n,d-1) \sum_{j=1}^{n} \frac{1}{j} \\
       &= A(n,d-1) \, H^{(1)}(n).
\end{aligned}$$

Iterating this recurrence on d easily gives the upper bound

$$A(n,d) \le H^{(1)}(n)^{d-1}.$$
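The upper bound $A(n,d) \le H^{(1)}(n)^{d-1}$ can be checked exactly for small parameters. The sketch below computes A(n,d) from the sum form $A(n,d) = \sum_{j \le n} A(j,d-1)/j$ and compares it with a power of the harmonic number; the function names are illustrative, and a finite check of this kind is of course not a proof.

```python
from fractions import Fraction


def average_maxima(n: int, d: int) -> Fraction:
    """Compute A(n,d) exactly from A(j,d) = sum_{i<=j} A(i,d-1)/i and A(j,1) = 1."""
    column = [Fraction(1)] * n  # column[j-1] holds A(j,1) for j = 1..n
    for _ in range(d - 1):
        running = Fraction(0)
        next_column = []
        for j, value in enumerate(column, start=1):
            running += value / j  # prefix sum of A(j,d-1)/j
            next_column.append(running)
        column = next_column
    return column[-1]


def harmonic(n: int) -> Fraction:
    """Return the n-th harmonic number 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))


def bound_holds(n: int, d: int) -> bool:
    """Check A(n,d) <= H(n)^(d-1) with exact rational arithmetic."""
    return average_maxima(n, d) <= harmonic(n) ** (d - 1)


print(all(bound_holds(n, d) for n in range(1, 31) for d in range(1, 6)))  # prints True
```

Equality holds when n = 1 or d ≤ 2; for larger d the bound becomes progressively looser.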
3. A FAST EXPECTED TIME MAXIMA ALGORITHM

So far in this paper we have considered the problem of counting the
number of maxima in a set of vectors; a related problem is finding the
maxima in a set of vectors. This problem has received much attention
recently. Kung, Luccio and Preparata [3] give an algorithm for finding
the maxima of n vectors in d-space that has worst-case running time of
$O(n \ln n)$ for d = 2 and $O(n (\ln n)^{d-2})$ for d ≥ 3. F. Yao [7]
shows that the results in 2- and 3-space are optimal by giving a
worst-case lower bound of $\Omega(n \ln n)$ (indeed, she gives a bound
that is the exact number of comparisons taken by a known algorithm for
planar sets).

These results, however, deal only with the worst-case complexity of
finding maxima; it is often interesting to consider also the average-case
complexity. In this section we will use Theorem 2 and a general divide
and conquer schema to give a fast expected time algorithm for finding
maxima (this schema is investigated in detail by Bentley and Shamos [1]).
The algorithm we develop here will have expected running time linear in n
for vector sets drawn under the "Independent and Distinct" assumptions
stated in Section 1.

Our maxima algorithm is easily described recursively. Without loss of
generality, we assume that n is a power of two. To find the maxima of a
set S of n vectors, divide S into two sets A and B, each containing n/2
vectors. Recursively find the maxima of A and B, calling those sets
$M_A$ and $M_B$, respectively. It is easy to see that the set of maximal
vectors of S is the set of maxima of $M_A \cup M_B$. Therefore we can
find all the maxima of S by finding the maxima of $M_A \cup M_B$; for
this we use the algorithm of Kung, Luccio and Preparata [3]. (Recursion
in our original algorithm stops when n is less than some predefined
constant.) The division into subproblems can be implemented on a random
access computer by storing the vectors in a d by n array of scalar
values. The set is initially represented as a pair of integers which
define the left and right endpoints of a segment in the array. Division
into further subsets can be accomplished by taking the arithmetic mean of
the endpoints as defining two new segments, etc.; note that the division
preserves randomness and can be accomplished in constant time.
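The recursive scheme can be sketched as follows. Note one substitution: the merge ("marriage") step here uses a simple quadratic scan as a stand-in for the algorithm of Kung, Luccio and Preparata, so this sketch illustrates the structure of the divide and conquer, not the stated running time. The function names and the `cutoff` parameter (assumed to be at least 1) are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def maxima_bruteforce(vectors: Sequence[Vector]) -> List[Vector]:
    """Quadratic scan; a stand-in for the Kung-Luccio-Preparata algorithm."""
    def dominated(q):
        return any(all(a > b for a, b in zip(p, q)) for p in vectors)
    return [q for q in vectors if not dominated(q)]


def maxima_divide_and_conquer(vectors: Sequence[Vector], lo: int = 0,
                              hi: Optional[int] = None,
                              cutoff: int = 4) -> List[Vector]:
    """Find the maxima of vectors[lo:hi] by halving the index segment.

    A subproblem is just the pair (lo, hi), so splitting takes constant time.
    """
    if hi is None:
        hi = len(vectors)
    if hi - lo <= cutoff:  # small segments are solved directly
        return maxima_bruteforce(vectors[lo:hi])
    mid = (lo + hi) // 2  # arithmetic mean of the endpoints
    m_a = maxima_divide_and_conquer(vectors, lo, mid, cutoff)
    m_b = maxima_divide_and_conquer(vectors, mid, hi, cutoff)
    # Marriage step: the maxima of the whole segment are the maxima of M_A union M_B.
    return maxima_bruteforce(m_a + m_b)


example = [(1, 2, 4), (2, 3, 1), (3, 1, 3), (4, 4, 2)]
print(maxima_divide_and_conquer(example, cutoff=1))  # prints [(1, 2, 4), (3, 1, 3), (4, 4, 2)]
```

Because only the maxima of each half reach the marriage step, the quadratic stand-in is applied to small inputs when the sets $M_A$ and $M_B$ are small, which is the situation Theorem 2 predicts on average.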

The expected running time of this algorithm is easy to analyze, given
that the expected number of maxima in a set of n d-vectors is
$O((\ln n)^{d-1})$. Since division into subproblems can be accomplished
in constant time, the recurrence describing the expected running time of
our algorithm on n d-vectors is

(3.1)  $T(n,d) = 2T(n/2,d) + F(n,d)$

where F(n,d) is the expected running time of the marriage step (finding
the maxima of $M_A \cup M_B$). Let i be the number of vectors in
$M_A \cup M_B$. Then the running time of the marriage step using the
algorithm of Kung, Luccio and Preparata is bounded above by
$O(i (\ln i)^{d-2})$ for d ≥ 3. This gives

$$F(n,d) \le \sum_{i=1}^{n} p(i) \cdot i (\ln i)^{d-2}$$

where p(i) is the probability of there being exactly i vectors in
$M_A \cup M_B$. By the fact that the number of maxima in A is independent
of that in B, the expected value of i satisfies:

$$\begin{aligned}
E(i) &= \sum_{i=1}^{n} p(i) \cdot i \\
     &= 2 \cdot (\text{expected number of maxima in a set of } n/2 \text{ d-vectors}) \\
     &= 2 \cdot O((\ln (n/2))^{d-1}) \\
     &= O((\ln n)^{d-1}).
\end{aligned}$$

Therefore, by ln i ≤ ln n, we have

(3.2)  $F(n,d) \le (\ln n)^{d-2} \sum_{i=1}^{n} p(i) \cdot i = O((\ln n)^{2d-3}).$

Substituting (3.2) into (3.1) gives for the running time of our algorithm
the recurrence

$$T(n,d) \le 2T(n/2,d) + O((\ln n)^{2d-3}).$$

For fixed d, this recurrence is well known to have the solution

$$T(n,d) = O(n).$$
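One way to see why the solution is linear (a sketch added for illustration, with k = 2d-3, n = 2^m, and c standing for the constant hidden in the O-term) is to divide the recurrence by n and unroll it:

```latex
\begin{aligned}
\frac{T(2^m,d)}{2^m}
  &\le \frac{T(2^{m-1},d)}{2^{m-1}} + \frac{c\,(\ln 2^m)^k}{2^m} \\
  &\le T(1,d) + c\,(\ln 2)^k \sum_{l=1}^{m} \frac{l^k}{2^l} \\
  &\le T(1,d) + c\,(\ln 2)^k \sum_{l=1}^{\infty} \frac{l^k}{2^l}.
\end{aligned}
```

The infinite series converges for every fixed k, so T(n,d)/n is bounded by a constant depending only on d, that is, T(n,d) = O(n).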

In addition to having a very fast expected running time, our algorithm
also has quite a respectable worst-case performance. Note that F(n,d) is
always bounded above by $O(n (\ln n)^{d-2})$ for d ≥ 3, so the worst-case
running time of our algorithm is given by

$$T(n,d) = 2T(n/2,d) + O(n (\ln n)^{d-2})$$

which has the solution

$$T(n,d) = O(n (\ln n)^{d-1}).$$

Thus in the worst case our algorithm is only a factor of ln n slower than
the best known worst-case algorithm.
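The growth rates of the two recurrences can be illustrated numerically. The sketch below evaluates T(n) = 2T(n/2) + cost(n) at powers of two, taking T(1) = 1 and setting the constants hidden in the O-terms to 1; both choices, like the function names, are assumptions made only for illustration.

```python
import math
from typing import Callable


def solve_recurrence(cost: Callable[[int], float], m: int, base: float = 1.0) -> float:
    """Return T(2^m) for T(n) = 2 T(n/2) + cost(n) with T(1) = base."""
    t = base
    for level in range(1, m + 1):
        t = 2.0 * t + cost(2 ** level)
    return t


def expected_cost(n: int, d: int = 3) -> float:
    """Expected marriage-step cost (ln n)^(2d-3), with the hidden constant set to 1."""
    return math.log(n) ** (2 * d - 3)


def worst_cost(n: int, d: int = 3) -> float:
    """Worst-case marriage-step cost n (ln n)^(d-2), with the hidden constant set to 1."""
    return n * math.log(n) ** (d - 2)


d = 3
for m in (10, 20, 30):
    n = 2 ** m
    linear_ratio = solve_recurrence(expected_cost, m) / n
    worst_ratio = solve_recurrence(worst_cost, m) / (n * math.log(n) ** (d - 1))
    print(m, round(linear_ratio, 3), round(worst_ratio, 3))
```

Under these assumptions the first ratio levels off as m grows, consistent with an O(n) solution, and the second stays bounded, consistent with $O(n (\ln n)^{d-1})$.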

We summarize the main result of this section in the following theorem.

Theorem 3.

The maxima of a set of n d-vectors drawn from a distribution satisfying
the "Independent and Distinct" property can be found in expected time
linear in n.

4. RELATION TO CONVEX HULLS

The maximal elements of a set of vectors are a crude representation of
the boundary of the set; the boundary can be more precisely defined as
the boundary of the convex hull of the set. While working with the convex
hull we will view the vectors as points in d-space. The convex hull of
the n points is then defined as the smallest convex set containing the n
points. One can get an intuitive picture of the convex hull of a planar
point set by imagining the n points as n nails in a large board, with
about an inch of each nail remaining above the board. The convex hull of
this set can be found by taking a large rubber band, stretching it
infinitely far out in all directions, then letting it go. It will come to
rest about certain of the nails, and the region within the rubber band is
the convex hull of the set.

Given a set of n points sampled independently from some underlying
probability distribution function in d-space, what is the expected number
of points on the resulting convex hull? (Here we use the abbreviation "on
the convex hull" to mean "on the boundary of the convex hull".) The
answer to this question is of course dependent on n, d, and the
underlying distribution. Santalo [4] describes a number of results for
different distributions; many of these results and their original
references are given in Bentley and Shamos [1]. In this section we will
give an upper bound on the number of hull points for distributions
satisfying our requirement of independence among the d variables. To
arrive at this bound we will first show that every convex hull point is a
maximum under at least one of the $2^d$ possible different assignments of
+ and - signs to the d components, and then use this fact and Theorem 2
to bound the expected number of hull points.

To show that every convex hull point is a maximum under at least one of
the assignments of + and - signs, assume that there is some hull point h
which is not. This implies that there is at least one point in each of
h's $2^d$ orthants; choose one point from each orthant and call this
collection P. Because values are distinct, all points in P are properly
contained in their orthants. Consider now the convex hull of P; it must
properly contain h. (If it contained all the points of P and not h, then
it would not be a convex set.) Since h is properly contained in the
convex hull of P it must also be properly contained in the convex hull of
the original set. This contradicts our assumption and establishes the
desired fact.
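The containment just established can be checked on a planar example. The sketch below collects the points that are maximal under at least one of the $2^d$ sign assignments and compares them with the hull vertices found by Andrew's monotone chain, a standard planar hull method chosen here for illustration (it is not taken from the paper). The function names and the sample point set are likewise illustrative.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def sign_maxima(points: Sequence[Point]) -> Set[Point]:
    """Return the points that are maximal under at least one +/- sign assignment."""
    d = len(points[0])
    keep = set()
    for signs in product((1, -1), repeat=d):
        flipped = [tuple(s * x for s, x in zip(signs, p)) for p in points]
        for i, q in enumerate(flipped):
            if not any(all(a > b for a, b in zip(p, q)) for p in flipped):
                keep.add(i)
    return {points[i] for i in keep}


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_2d(points: Sequence[Point]) -> List[Point]:
    """Return the vertices of the planar convex hull (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # endpoints are shared, so drop one copy of each


# A sample planar set with distinct values in each coordinate.
sample = [(i, (37 * i) % 101) for i in range(1, 101)]
print(set(convex_hull_2d(sample)) <= sign_maxima(sample))  # prints True
```

The reverse containment does not hold in general: a point can be maximal under some sign assignment and still lie strictly inside the hull, which is why the set of sign-maxima only gives an upper bound on the number of hull points.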

We have shown that every hull point is a maximum under at least one of
the $2^d$ possible assignments of + and - signs to the d variables.
Consider now the set of all points that are maximal under at least one of
the sign assignments; call this set M. Since the expected size of M is
bounded above by $2^d \cdot O((\ln n)^{d-1})$, and M contains all convex
hull points, the expected number of convex hull points is certainly
bounded above by that expression. Thus we have the following theorem.

Theorem 4.

The expected number of convex hull points in a point set of n points in d
dimensions satisfying the "Independent and Distinct" property is bounded
above by $O((\ln n)^{d-1})$.